Session 3: Statistical modeling and machine learning
Department of Econometrics and Business Statistics
Food servers’ tips in restaurants may be influenced by many factors, including the nature of the restaurant, size of the party, and table locations in the restaurant. Restaurant managers need to know which factors matter when they assign tables to food servers. For the sake of staff morale, they usually want to avoid either the substance or the appearance of unfair treatment of the servers, for whom tips (at least in restaurants in the United States) are a major component of pay.
In one restaurant, a food server recorded the following data on all customers they served during an interval of two and a half months in early 1990. The restaurant, located in a suburban shopping mall, was part of a national chain and served a varied menu. In observance of local law the restaurant offered seating in a non-smoking section to patrons who requested it. Each record includes a day and time, and taken together, they show the server’s work schedule.
What is \(y\)? What is \(x\)?
Every person monitored their email for a week and recorded information about each email message; for example, whether it was spam, and what day of the week and time of day the email arrived. We want to use this information to build a spam filter, a classifier that will catch spam with high probability but will never classify good email as spam.
What is \(y\)? What is \(x\)?
A health insurance company collected the following information about households:
The health insurance company wants to provide a small range of products, containing different bundles of services and for different levels of cover, to market to customers.
What is \(y\)? What is \(x\)?
All (data-centric) models have fitted values and residuals.
\[y = f(x_1, x_2, ..., x_p) + \varepsilon\]
where \(y\) is the observed response, \(x_1, ..., x_p\) are the observed values of the \(p\) predictors, and \(\varepsilon\) is the error. We conventionally use \(n\) to denote the sample size.
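To make fitted values and residuals concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made for illustration: the linear form of \(f\), the coefficient values, the noise level and the variable names are not taken from the examples above.

```python
import numpy as np

# Simulated data: all values below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 100                                # sample size
x = rng.uniform(0, 10, size=n)         # a single predictor (p = 1)
eps = rng.normal(0, 1, size=n)         # the error term
y = 2.0 + 0.5 * x + eps                # y = f(x) + eps, with an assumed linear f

# Estimate f by least squares, using an intercept column plus x.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta_hat                  # fitted values: the model's estimate of f(x)
residuals = y - fitted                 # residuals: what the model leaves unexplained

print("estimated coefficients:", beta_hat)
print("residual standard deviation:", residuals.std(ddof=2))
```

By construction, the observed response splits exactly into fitted value plus residual. The residuals are the sample's stand-in for the unobserved \(\varepsilon\).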
Predictive accuracy
The primary purpose is to be able to predict \(\widehat{Y}\) for new data. And we’d like to do that well! That is, accurately.
Interpretability
Almost equally important is that we want to understand the relationship between \({\mathbf X}\) and \(Y\). The simpler model that is (almost) as accurate is the one we choose, always.
Person: Why did you predict 42 for this value?
Computer: Awkward silence
Parametric methods
Non-parametric methods
Black line is true boundary.
Grids (right) show boundaries for two different models.
If the model form is incorrect, errors (solid circles) may arise because the fitted boundary has the wrong shape; this error is reducible. Irreducible error is what remains when the model form is right: the mistakes (solid circles) are random noise.